OPTIMIZATION OF K-NEAREST NEIGHBOUR TO CATEGORIZE INDONESIAN'S NEWS ARTICLES
Abstract
Text classification is the process of grouping documents into categories based on similarity. One obstacle in classifying text is that many words appear in the documents, and some of them occur with very low frequency (sparse words). One way to address this problem is to conduct a feature selection process. Several filter-based methods exist, such as Chi-Square, Information Gain, the Genetic Algorithm, and Particle Swarm Optimization (PSO); Aghdam's research shows that PSO performs best among these methods. This study examines how to optimize the performance of the k-Nearest Neighbour (k-NN) algorithm in categorizing news articles. k-NN is a simple algorithm that is easy to implement, and with appropriate features it becomes a reliable algorithm. PSO was used to select keywords (term features), and classification then proceeded with k-NN. Testing consisted of three stages: tuning the k-NN parameter, tuning the PSO parameters, and measuring performance. The tuning stages aim to determine the number of neighbours and the number of particles; performance is then compared with and without PSO. The optimal number of neighbours was 9 and the optimal number of particles was 50. Feature selection with PSO showed a 50% reduction in the number of terms, and the results gave about 20 per cent better accuracy than k-NN without PSO. Although PSO did not always find the optimal conditions, the method can still produce good accuracy. In this way, this work can be used to categorize articles, especially Indonesian-language articles.
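The pipeline the abstract describes, PSO-based term selection followed by k-NN classification with k = 9 and 50 particles, can be sketched briefly. The code below is not the authors' implementation: it assumes scikit-learn and NumPy, uses a handful of invented Indonesian snippets as placeholder documents, and applies a standard binary PSO with a sigmoid transfer function as the feature selector.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_fitness(X, y, mask, k=9):
    """Mean 3-fold CV accuracy of k-NN restricted to the selected term columns."""
    if mask.sum() == 0:
        return 0.0
    k = min(k, len(y) * 2 // 3)                     # keep k valid on tiny toy data
    clf = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def binary_pso_feature_selection(X, y, n_particles=50, n_iter=30, seed=0):
    """Binary PSO: each particle is a boolean mask over the term features."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, n_features)).astype(bool)
    vel = rng.uniform(-1, 1, size=(n_particles, n_features))
    pbest = pos.copy()
    pbest_fit = np.array([knn_fitness(X, y, p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5                       # inertia and acceleration weights
    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - pos)
               + c2 * r2 * (gbest.astype(float) - pos))
        flip_prob = 1.0 / (1.0 + np.exp(-vel))      # sigmoid transfer function
        pos = rng.random(vel.shape) < flip_prob
        fit = np.array([knn_fitness(X, y, p) for p in pos])
        better = fit > pbest_fit
        pbest[better] = pos[better]
        pbest_fit[better] = fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest

# Illustrative usage on made-up documents (0 = economy, 1 = sport).
docs = ["harga saham naik tajam", "tim sepak bola menang besar",
        "bank sentral menurunkan suku bunga", "pemain muda mencetak dua gol",
        "inflasi tahunan mulai turun", "pelatih baru resmi ditunjuk"]
labels = np.array([0, 1, 0, 1, 0, 1])
X = TfidfVectorizer().fit_transform(docs).toarray()
mask = binary_pso_feature_selection(X, labels, n_particles=50, n_iter=10)
print("terms kept:", int(mask.sum()), "of", X.shape[1])
print("CV accuracy with selected terms:", knn_fitness(X, labels, mask))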
Similar resources
k-Nearest Neighbour Classifiers
Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier – classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance today because issues of poor run-time performance are not such...
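As a minimal illustration of that idea (my own sketch, not code from the article), the function below classifies a query point by majority vote among its k nearest training examples under Euclidean distance; the data points are invented.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    dists = np.linalg.norm(X_train - query, axis=1)       # distance to every training point
    nearest = np.argsort(dists)[:k]                       # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0] # majority vote among neighbours

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.8]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # -> A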
Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction
Suppose a bank has a database of people’s details and their credit rating. These details would probably be the person’s financial characteristics such as how much they earn, whether they own or rent a house, and so on, and would be used to calculate the person’s credit rating. However, the process for calculating the credit rating from the person’s details is quite expensive, so the bank would ...
Convergence of random k-nearest-neighbour imputation
Random k-nearest-neighbour (RKNN) imputation is an established algorithm for filling in missing values in data sets. Assume that data are missing in a random way, so that missingness is independent of unobserved values (MAR), and assume there is a minimum positive probability of a response vector being complete. Then RKNN, with k equal to the square root of the sample size, asymptotically produ...
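A rough sketch of that procedure under my own reading (not the paper's code): for each incomplete record, find its k nearest fully observed neighbours, with k set to the square root of the sample size, and fill each missing entry with the value taken from a randomly drawn neighbour. Only NumPy is assumed, and the small matrix is illustrative.

import numpy as np

def rknn_impute(X, seed=0):
    rng = np.random.default_rng(seed)
    X = X.astype(float).copy()
    k = max(1, int(round(np.sqrt(X.shape[0]))))        # k = sqrt(sample size)
    complete = ~np.isnan(X).any(axis=1)                # fully observed rows
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])                          # columns observed in row i
        dists = np.linalg.norm(X[complete][:, obs] - X[i, obs], axis=1)
        donors = np.where(complete)[0][np.argsort(dists)[:k]]
        for j in np.where(~obs)[0]:
            X[i, j] = X[rng.choice(donors), j]         # random draw from a neighbour
    return X

X = np.array([[1.0, 2.0], [1.1, np.nan], [5.0, 6.0], [5.2, 6.1]])
print(rknn_impute(X))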
CONNECTIVITY OF RANDOM k-NEAREST-NEIGHBOUR GRAPHS
Let P be a Poisson process of intensity one in a square Sn of area n. We construct a random geometric graph Gn,k by joining each point of P to its k ≡ k(n) nearest neighbours. Recently, Xue and Kumar proved that if k ≤ 0.074 log n then the probability that Gn,k is connected tends to 0 as n → ∞ while, if k ≥ 5.1774 log n, then the probability that Gn,k is connected tends to 1 as n → ∞. They conject...
Small components in k-nearest neighbour graphs
Let G = Gn,k denote the graph formed by placing points in a square of area n according to a Poisson process of density 1 and joining each point to its k nearest neighbours. In [2] Balister, Bollobás, Sarkar and Walters proved that if k < 0.3043 logn then the probability that G is connected tends to 0, whereas if k > 0.5139 logn then the probability that G is connected tends to 1. We prove that,...
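These thresholds are easy to probe empirically. The simulation below is an illustrative sketch (assuming NumPy and networkx, with n = 1000 and the constants c chosen arbitrarily): it drops a Poisson number of points in a square of area n, joins each point to its k nearest neighbours with undirected edges, and estimates the probability that the resulting graph is connected.

import numpy as np
import networkx as nx

def knn_graph_connected(n, k, rng):
    m = rng.poisson(n)                                 # Poisson number of points
    pts = rng.uniform(0, np.sqrt(n), size=(m, 2))      # square of area n
    G = nx.Graph()
    G.add_nodes_from(range(m))
    for i in range(m):
        d = np.linalg.norm(pts - pts[i], axis=1)
        for j in np.argsort(d)[1:k + 1]:               # skip the point itself
            G.add_edge(i, int(j))
    return nx.is_connected(G)

rng = np.random.default_rng(0)
n = 1000
for c in (0.2, 0.5, 1.0):                              # k = c * log n
    k = max(1, int(c * np.log(n)))
    trials = [knn_graph_connected(n, k, rng) for _ in range(5)]
    print(f"k = {k:2d} (c = {c}): empirical P(connected) = {np.mean(trials)}")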
Journal
Journal title: Asia-Pacific Journal of Information Technology and Multimedia
Year: 2021
ISSN: 2289-2192
DOI: https://doi.org/10.17576/apjitm-2021-1001-04